A subsampled double bootstrap for massive data
Abstract
The bootstrap is a popular and powerful method for assessing the precision of estimators and inferential methods. However, for massive datasets, which are increasingly prevalent, the bootstrap becomes prohibitively costly to compute, and its feasibility is questionable even on modern parallel computing platforms. Recently, Kleiner, Talwalkar, Sarkar, and Jordan (2014) proposed the Bag of Little Bootstraps (BLB), a method for massive data that is more computationally scalable with little sacrifice of statistical accuracy. Building on BLB and the idea of the fast double bootstrap, we propose a new resampling method, the subsampled double bootstrap, for both independent data and time series data. We establish consistency of the subsampled double bootstrap under mild conditions in both the independent and dependent cases. Methodologically, the subsampled double bootstrap is superior to BLB in running time, covers more of the sample, and can be implemented automatically with fewer tuning parameters for a given time budget. Its advantage over BLB and the bootstrap is also demonstrated in numerical simulations and a data illustration.
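The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading of "BLB combined with the fast double bootstrap": draw many small random subsets and, for each subset, draw a single resample of full size n. The function name `sdb_standard_error`, its parameters, the choice of the sample mean as the estimator, and the use of multinomial counts to represent the resample are all assumptions made for illustration and are not taken from the paper. The sketch covers the independent-data case only.

```python
import numpy as np


def sdb_standard_error(data, subset_size, num_subsets, seed=0):
    """Hypothetical sketch: standard error of the sample mean.

    Assumed scheme (not confirmed by the abstract): for each random
    subset, draw ONE resample of full size n and record the difference
    between the resample estimate and the subset estimate.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    probs = np.full(subset_size, 1.0 / subset_size)
    roots = np.empty(num_subsets)
    for j in range(num_subsets):
        # First level: a small random subset, drawn without replacement.
        subset = data[rng.choice(n, size=subset_size, replace=False)]
        # Second level: a single size-n resample from that subset, stored
        # as multinomial counts so only subset_size values are touched.
        counts = rng.multinomial(n, probs)
        resample_mean = np.dot(counts, subset) / n
        roots[j] = resample_mean - subset.mean()
    # The spread of the roots across subsets approximates the
    # sampling variability of the full-sample mean.
    return float(np.std(roots, ddof=1))
```

Under these assumptions, each iteration works with only `subset_size` distinct values, and every iteration starts from a fresh subset, which is one way to read the "more sample coverage" claim. For the sample mean, the result can be compared against the textbook value `data.std(ddof=1) / np.sqrt(len(data))`.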
Related articles
SFB 823 A subsampled double bootstrap for massive data
The bootstrap is a popular and powerful method for assessing precision of estimators and inferential methods. However, for massive datasets which are increasingly prevalent, the bootstrap becomes prohibitively costly in computation and its feasibility is questionable even with modern parallel computing platforms. Recently Kleiner, Talwalkar, Sarkar, and Jordan (2014) proposed a method called BL...
A scalable bootstrap for massive data
The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large datasets—which are increasingly prevalent— the computation of bootstrap-based quantities can be prohibitively demanding computationally. While variants such as subsampling and the m out of n bootstrap can be used in principle to reduce the cost of bootstrap computation...
Improving the reliability of bootstrap tests with the fast double bootstrap
Two procedures are proposed for estimating the rejection probabilities of bootstrap tests in Monte Carlo experiments without actually computing a bootstrap test for each replication. These procedures are only about twice as expensive (per replication) as estimating rejection probabilities for asymptotic tests. Then a new procedure is proposed for computing bootstrap P values that will often be ...
Inferential Procedures Based on the Double Bootstrap for Log Logistic Regression Model with Censored Data
Traditional inferential procedures based on the asymptotic normality assumption such as the Wald often produce misleading inferences when dealing with censored data and small samples. Alternative estimation techniques such as the jackknife and bootstrap percentile allow us to construct the interval estimates without relying on any classical assumptions. Recently, the double bootstrap became pre...
Computational algorithms for double bootstrap confidence intervals
In some cases, such as in the estimation of impulse responses, it has been found that for plausible sample sizes the coverage accuracy of single bootstrap confidence intervals can be poor. The error in the coverage probability of single bootstrap confidence intervals may be reduced by the use of double bootstrap confidence intervals. The computer resources required for double bootstrap confiden...
Publication date: 2015